Using SOFM to Improve Web Site Text Content
Abstract
We introduce a new method for improving the text content of a web site by identifying the most relevant free text in its pages. To capture how the page text varies, we collect the pages over a period of time. The text content of each page is then transformed into a feature vector and used as input to a clustering algorithm, the self-organizing feature map (SOFM), which groups the vectors by common text content. From each cluster, a centroid and its neighboring vectors are extracted. Then, using a reverse clustering analysis, the pages represented by each vector are reviewed in order to find what they have in common. The proposed method was tested on a real web site, where it proved effective.
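The pipeline described in the abstract (text to feature vector, SOFM clustering, then a reverse lookup from map units back to pages) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the vector weighting, map size, or training schedule, so the plain term-frequency vectors, the 2x2 grid, the decay schedule, and every function name here (`tf_vectors`, `train_sofm`, `pages_by_unit`) are assumptions chosen for brevity.

```python
import math
import random
from collections import Counter, defaultdict


def tf_vectors(pages):
    """Turn each page's text into a length-normalised term-frequency vector.
    (Assumption: the abstract does not say how the vectors are weighted.)"""
    tokenised = [text.lower().split() for text in pages]
    vocab = sorted({word for tokens in tokenised for word in tokens})
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(weights, vec):
    """Grid position (row, col) of the unit whose weights are closest to vec."""
    return min(weights, key=lambda unit: sq_dist(weights[unit], vec))


def train_sofm(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small self-organizing feature map on the given vectors.
    Grid size, learning rate and radius schedule are illustrative guesses."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays linearly towards 0
        rate = 0.5 * frac                        # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for vec in rng.sample(vectors, len(vectors)):
            br, bc = best_unit(weights, vec)
            for (r, c), w in weights.items():
                # Units near the winner on the grid are pulled towards vec too.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += rate * influence * (vec[i] - w[i])
    return weights


def pages_by_unit(weights, vectors):
    """Reverse lookup: which page indices are represented by each map unit."""
    clusters = defaultdict(list)
    for page_id, vec in enumerate(vectors):
        clusters[best_unit(weights, vec)].append(page_id)
    return dict(clusters)


# Hypothetical page texts, made up for this example.
pages = [
    "loans credit interest rates",
    "credit cards interest rates",
    "branch opening hours contact",
    "contact branch phone hours",
]
vocab, vectors = tf_vectors(pages)
weights = train_sofm(vectors)
clusters = pages_by_unit(weights, vectors)
for unit, page_ids in sorted(clusters.items()):
    print(unit, page_ids)
```

Each unit's weight vector plays the role of a cluster centroid, and `pages_by_unit` is one simple way to go from a centroid back to the pages it represents so that their shared text can be reviewed by hand. A real run would use many more pages and a larger map.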
Similar resources
Web site keywords: A methodology for improving gradually the web site text content
The construction of a web site is a great challenge that integrates different elements such as the hyperlink structure, colors, pictures, movies and textual contents. In the latter, the correct textual content can be the key to attracting users to visit the site. In fact, many users visit a web site by using a web search engine such as, for example, Google or Yahoo!, and continue exploring the ...
Identifying Keywords to Improve a Web Site Text Content
The steadily increasing competition of internet web sites makes it both more difficult and more important to attract and retain users. However, it is not always possible to determine beforehand which content is most appropriate to reach this goal, since the behavior and requirements of users can be heterogeneous and changing over time. In order to improve a web site text content, it is necessar...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we are inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
A Framework for Summarization of Multi-topic Web Sites
Web site summarization, which identifies the essential content covered in a given Web site, plays an important role in Web information management. However, straightforward summarization of an entire Web site with diverse content may lead to a summary heavily biased to the dominant topics covered in the target Web site. In this paper, we propose a two-stage framework for effective summarization ...
Developing a Web site in primary care.
BACKGROUND AND OBJECTIVES: While content, navigability, and usability are essential qualities of effective Web sites, the health care literature contains limited discussion of these issues. This article describes how knowledge gained through focus groups, Web site searches, and individual interviews was used to develop and improve a health-related Web site. METHODS: We conducted 10 focus group...
Publication date: 2005